HTML Basics: Terminology

Introduction

Tags, content and presentation

In its most basic form, an HTML document consists of text, enclosed in tags. These tags (more accurately, these elements) describe the meaning of the text they contain, rather than how the enclosed text should be displayed. This concept is called content-based markup, as opposed to presentational markup.

Content-based markup allows device independence; knowing the meaning of a piece of text allows a browser to render it as well as possible on the platform it is running on. With presentational markup this is impossible: without knowing why a string of text must be displayed in red 20-point Helvetica, a browser cannot pick a good alternative way to display it on a screen where this font isn't available.
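The difference can be sketched with a chapter heading. The heading text is made up for illustration, and the font size is only one plausible way an author might imitate a heading.

```html
<!-- Presentational: says only how the text should look -->
<FONT SIZE="+3"><B>Chapter 1</B></FONT>

<!-- Content-based: says that the text is a top-level heading;
     the browser decides how to render it on its platform -->
<H1>Chapter 1</H1>
```

A text-only or speech browser can still treat the second form as a heading, while the first form gives it nothing to work with beyond "large and bold".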

Using tags

An element, when used in a document, consists of an opening and a closing tag. The closing tag is not always used; it may be optional, or even forbidden. Elements which have both an opening and a closing tag are referred to as container elements, and elements without a closing tag as empty elements. Container elements may not overlap each other: if you are nesting them, always close the innermost container first.
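A minimal sketch of these cases, using a few common elements chosen here as examples:

```html
<!-- Container element: opening and closing tag -->
<EM>emphasized text</EM>

<!-- Empty elements: the closing tag is forbidden -->
<BR>
<HR>

<!-- Optional closing tag: each P ends where the next one begins -->
<P>First paragraph.
<P>Second paragraph.

<!-- Correct nesting: the innermost container is closed first -->
<STRONG><EM>important</EM></STRONG>

<!-- Incorrect: the two containers overlap -->
<STRONG><EM>important</STRONG></EM>
```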

An opening tag can have attributes. These provide extra information about the element and the text it encloses, if any. For example, the A tag has an HREF attribute which specifies the document that the anchored text links to.

The attribute may have a value, although this is not necessary in all cases. If it has a value, it is specified in the "name=value" form. The value must be enclosed in quotes if it contains anything more than letters, digits, hyphens and/or periods. In all other cases, quoting is optional. The maximum length for an attribute value is 1024 characters, including the quotation marks (if used).
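The quoting rules above might look like this in practice. The URL is a placeholder, and the elements are picked only to show one attribute of each kind.

```html
<!-- The value contains ":" and "/", so it must be quoted -->
<A HREF="http://www.example.com/index.html">Example site</A>

<!-- The value consists of letters only, so the quotes are optional -->
<P ALIGN=center>Centered paragraph</P>

<!-- An attribute without a value -->
<UL COMPACT>
  <LI>First item
  <LI>Second item
</UL>
```

Quoting every value is always allowed and avoids having to check which characters a value contains.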

The generic structure

The document can be divided into two parts, the head and the body. The document head provides information about the document, for example its title, the author and a short description (there's a separate section on using the document head in the HTML Basics series). The document body holds the actual contents of the document.
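A skeleton of such a document could look as follows. The title, author name and description are invented, and the document type declaration assumes HTML 3.2, the version this text later calls Wilbur.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
  <!-- The head: information about the document -->
  <TITLE>Sample document</TITLE>
  <META NAME="author" CONTENT="A. N. Author">
  <META NAME="description" CONTENT="A short description of the page.">
</HEAD>
<BODY>
  <!-- The body: the actual contents -->
  <H1>Sample document</H1>
  <P>The contents of the document go here.</P>
</BODY>
</HTML>
```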

Building the body - blocks

The document body is built up with so-called block elements or block-level tags. A block element marks up a section of text and assigns it a particular meaning. For example, you can indicate that a section of text is a heading, a large quotation or an item in a list. There are also block elements which may only contain other block elements and no text. These elements include lists (which may only contain list items) and tables (which may only contain table rows full of cells). Finally, some block elements may contain other block elements as well as text. These are sometimes referred to as super-block elements.

Block elements which may not contain text are used to hold certain block elements together, so they form a logical unit. A list is a good example of this; it groups all the list items inside together, so the browser knows the items are part of the same list. A slightly more complex example is the table. An HTML table is built up from rows of cells, and the table tag itself contains an optional caption, followed by one or more rows. The rows may only contain header or data cells, and the cells themselves may contain almost any element.
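Both structures can be sketched briefly. The fruit names and prices are made-up sample data.

```html
<!-- A list groups its items; text goes inside LI, not directly inside UL -->
<UL>
  <LI>Apples</LI>
  <LI>Pears</LI>
</UL>

<!-- A table: an optional caption, followed by rows.
     Rows hold only header (TH) or data (TD) cells. -->
<TABLE BORDER=1>
  <CAPTION>Fruit prices</CAPTION>
  <TR><TH>Fruit</TH><TH>Price</TH></TR>
  <TR><TD>Apples</TD><TD>1.20</TD></TR>
  <TR><TD>Pears</TD><TD>1.50</TD></TR>
</TABLE>
```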

Special cases

A super-block element assigns a meaning to a set of block elements. The division tag, DIV, is probably the best example. It can be used to set a default alignment or style attributes for all the block elements it contains. This is easier to do than setting that property for each block element inside.
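As a small illustration with invented text, one DIV can center everything it contains:

```html
<!-- The alignment is set once on the DIV instead of on every block inside -->
<DIV ALIGN=center>
  <H2>Centered heading</H2>
  <P>This paragraph is centered too.</P>
  <P>So is this one.</P>
</DIV>
```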

A special case is the preformatted text container. It is the only container in which line breaks and spacing are rendered exactly as they appear in the source. This is very useful if you are inserting ASCII art, or text which requires a specific layout and spacing, for example the source for a program.
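A short sketch with the PRE element, using a small C loop as the sample program source:

```html
<PRE>
for (i = 0; i &lt; 10; i++) {
    printf("%d\n", i);
}
</PRE>
```

The indentation and line breaks are kept as written. Markup characters are still interpreted inside PRE, which is why the less-than sign is written as &lt; here.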

Text - adding the contents

Inside the block elements, the actual text is found. This text should be written only with characters in the ISO Latin 1 character set. In HTML, spaces and newlines are considered identical. They are referred to as whitespace, and if multiple whitespace characters are used in sequence, the browser should display only a single space.
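For instance, a browser following this rule should display the two paragraphs in this sketch identically:

```html
<P>These    words   are
separated by
   irregular whitespace.</P>

<P>These words are separated by irregular whitespace.</P>
```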

Depending on the block, the text inside it may also be marked up. In general, the text-level tags used for this can be divided into three categories:

Appearance (font tags), which change the appearance of the text.
Logical (phrase tags), which assign text a particular meaning.
Special tags, which assign text a particular functionality.

Appearance/font tags

Font tags are used to change the appearance of the text. This includes font size changes, boldface, italics and super/subscript. However, if a browser can't perform the appearance change, it has no way to determine a good alternative. As said above, without knowing why this font change should be performed, the browser can't pick another way to display or process the text. Likewise, a search engine can't know that something in italics is a book title unless you tell it.

This limitation can cause problems if your document depends on this appearance change. There is no guarantee or requirement that a browser will display a font tag in the way the name suggests.

Logical markup

It's not always necessary to use a font tag. Often the change in appearance is an attempt to assign a special meaning to the text. For example, italics are often used for citations or emphasized text. In these cases, a better approach is to use a logical tag to indicate this meaning. The browser can then pick the best way to display that kind of text on the screen.

For example, if the browser does not support italics, it can still display citations and emphasized text correctly, although probably in a different fashion.
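The same sentence marked up both ways shows the difference. The sentence itself is only an example.

```html
<!-- Font tags: italics, with no indication why -->
<I>Moby Dick</I> is a <I>very</I> long book.

<!-- Phrase tags: a citation and an emphasized word -->
<CITE>Moby Dick</CITE> is a <EM>very</EM> long book.
```

Many graphical browsers render both lines in italics, but only the second tells a browser or a search engine which part is a title and which part is emphasis.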

Special markup

The third category, special tags, does not assign a meaning or an appearance change to text, but functionality instead. The most common example is the hyperlink, which assigns a connection to another document to the enclosed text. Inline images also fall in this category.

Strangely enough, the Wilbur (HTML 3.2) specification also includes the FONT tag in this group, although it is clearly an appearance tag. The three building blocks for HTML forms (INPUT, TEXTAREA and SELECT) are also text-level tags, and can be grouped in the "Special" category.
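A rough sketch of the special tags mentioned here. All file names, the form's ACTION URL and the field names are hypothetical.

```html
<!-- Hyperlink: connects the enclosed text to another document -->
<A HREF="chapter2.html">Continue with chapter 2</A>

<!-- Inline image -->
<IMG SRC="logo.gif" ALT="Company logo">

<!-- Form controls are text-level, so they sit inside a block such as P;
     in a real page they also need a surrounding FORM element -->
<FORM ACTION="/cgi-bin/search" METHOD=get>
  <P>Search for: <INPUT TYPE=text NAME=query>
  <SELECT NAME=scope>
    <OPTION>This site
    <OPTION>The whole web
  </SELECT>
  <TEXTAREA NAME=notes ROWS=3 COLS=40></TEXTAREA>
  <INPUT TYPE=submit VALUE="Go"></P>
</FORM>
```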

A final note

In almost all cases, you can use each text-level tag inside another one, even when this doesn't make sense. There is no way to prevent this in the specification, so it's up to the author to use only meaningful constructs. If a meaningless construct is used (such as, for example, <EM><INPUT TYPE=radio NAME=foo></EM>), you can get unexpected results if a browser tries to render it.
